GRANT NO: DAMD17-94-J-4456


TITLE:  Computer-Aided  Mammography  Using  Automated  Feature 

Extraction  for  the  Detection  and  Diagnosis  of  Breast  Cancer 


PRINCIPAL INVESTIGATOR(S): Joseph Y. Lo

CONTRACTING  ORGANIZATION:  Duke  University  Medical  Center 

Durham,  North  Carolina  27710 


REPORT  DATE:  October  12,  1995 


TYPE  OF  REPORT:  Annual 


PREPARED  FOR:  U.S.  Army  Medical  Research  and  Materiel  Command 

Fort  Detrick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT:  Approved  for  public  release; 

distribution  unlimited 


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those  of  the  author (s)  and  should  not  be  construed  as  an  official 
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 




REPORT DOCUMENTATION PAGE

Form Approved
OMB No. 0704-0188

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: 12 Oct 1995
3. REPORT TYPE AND DATES COVERED: Annual Report (15 Sep 94 - 14 Sep 95)
4. TITLE AND SUBTITLE: Computer-Aided Mammography Using Automated Feature
   Extraction for the Detection and Diagnosis of Breast Cancer
5. FUNDING NUMBERS: DAMD17-94-J-4456
6. AUTHOR(S): Joseph Y. Lo
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES):
   Duke University Medical Center
   Durham, North Carolina 27710
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
   U.S. Army Medical Research and Materiel Command
   Fort Detrick, Frederick, MD 21702-5012
10. SPONSORING/MONITORING AGENCY REPORT NUMBER:
12a. DISTRIBUTION/AVAILABILITY STATEMENT:
   Approved for public release; distribution unlimited
12b. DISTRIBUTION CODE:

We developed artificial neural network (ANN) techniques to predict breast lesion malignancy
based on mammographic features extracted by radiologists. The 3-layer backpropagation
ANNs were trained and tested using the round robin technique and evaluated by receiver
operating characteristic (ROC) analysis. Using all 11 available features from 206 patients,
the ANN performed with ROC area Az of 0.84 ± 0.03, which was not significantly different
from the expert radiologists' Az of 0.85 ± 0.03 (2-tailed p-value = 0.54). We then ranked the
importance of individual features to reduce the number of ANN input features. The resulting
6-feature ANN had Az of 0.86 ± 0.03, which was still not significantly different from that of
the expert radiologists (p = 0.34). The result was an optimally simplified ANN for
merging features to predict breast lesion malignancy. In the following years, work will focus
on automated extraction of those features to feed into the ANN inputs, thus producing a fully
automated computer-aided diagnosis system.

14. SUBJECT TERMS: computer-aided diagnosis, image processing, breast cancer,
   mammography
15. NUMBER OF PAGES: 21
16. PRICE CODE:
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: Unlimited

NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18


Grant  No.  DAMD17-94-J-4456 


FOREWORD

Opinions, interpretations, conclusions and recommendations are those of
the author and are not necessarily endorsed by the U.S. Army.

( ) Where copyrighted material is quoted, permission has been obtained
to use such material.

( ) Where material from documents designated for limited distribution
is quoted, permission has been obtained to use the material.

( ) Citations of commercial organizations and trade names in this
report do not constitute an official Department of the Army endorsement or
approval of the products or services of these organizations.

( ) In conducting research using animals, the investigator(s) adhered
to the "Guide for the Care and Use of Laboratory Animals," prepared by the
Committee on Care and Use of Laboratory Animals of the Institute of Laboratory
Animal Resources, National Research Council (NIH Publication No. 86-23,
Revised 1985).

( ) For the protection of human subjects, the investigator(s) have
adhered to policies of applicable Federal Law 45 CFR 46.

( ) In conducting research utilizing recombinant DNA technology, the
investigator(s) adhered to current guidelines promulgated by the National
Institutes of Health.


PI: Joseph Y. Lo, Ph.D.


Table of Contents

1. Introduction
1.1. Significance of diagnostic problem
1.2. Potential of the proposed technique
1.3. Computer-aided diagnosis using artificial neural networks
1.4. Technical Objectives

2. Body
2.1. Data preparation
2.2. Backpropagation neural network architecture
2.3. Backpropagation training algorithm
2.4. Optimized reduction of input features
2.5. Other technical objectives

3. Conclusions

4. References




1.  Introduction 


1.1.  Significance  of  diagnostic  problem 

In the U.S. in 1994, there were approximately 182,000 new cases and 46,000
deaths due to breast cancer, making it second only to lung cancer as the cause of
cancer death among women [1]. Mammography is the modality of choice for early
detection of breast cancer and can significantly decrease the mortality for women
undergoing screening [2,3]. Evaluating mammograms remains a challenging task for
radiologists, however, as they consider many radiographic and non-radiographic
features in order to decide whether a lesion is benign or whether it should be
followed or biopsied. Although mammography is very sensitive, there are a large
number of false-positive biopsies. Of women with radiographically suspicious,
nonpalpable lesions who are sent to biopsy, only 15 to 34% actually have a
malignancy by histologic diagnosis [4,5].

1.2.  Potential  of  the  proposed  technique 

This study seeks to improve the diagnosis and treatment of breast cancer by
reducing the cost and morbidity of unnecessary biopsies. Cost is a major obstacle to
widespread acceptance of mammographic screening [6]. It has been shown that
surgeons' fees and biopsy costs account for over half the cost of detecting small
breast cancers in a screening population [7]. Preventing unnecessary biopsies is
therefore one of the most important ways to improve the efficacy of mammographic
screening. Many previous reports have discussed the need to reduce the number of
benign biopsies [8,9].

To improve early diagnosis, we propose an automated computer-aided
diagnosis (CADx) system for mammography. The system will perform automated
feature extraction from mammograms using artificial neural networks (ANNs) and
other image processing techniques, then predict the outcome of biopsy (benign vs.
malignant). The intent is to identify probably benign lesions for which biopsies may
be spared. This study will potentially provide an accurate, consistent aid for the
early diagnosis of breast cancer.

A successful mammography CADx system consists of two stages:
(1) automated extraction of various features from the mammogram, potentially by
different methods appropriate for each task, and (2) accurate "merging" of those
features by computer algorithms to produce the diagnosis. Automated feature
extraction can improve the accuracy, specificity, consistency, efficiency, and
accessibility of breast cancer diagnosis.

1.3. Computer-aided diagnosis using artificial neural networks

In medical imaging, CADx systems provide radiologists with information
from computerized analysis of images or image features, thus helping radiologists
detect or diagnose diseases more accurately, easily, and consistently [10,11]. CADx
has been applied to such varied problems as interstitial disease [12,13], cardiomegaly
[14], pneumothorax [15], lung nodules [16-18], nuclear medicine lesions [19], and
pulmonary embolism [20]. In mammography specifically, there have been numerous
reports on computerized detection [21-27] or diagnosis [28-33] of breast cancer.
Although both are generally considered CADx systems, detection systems locate
suspicious lesions in an image, while diagnosis systems such as the one in this study
determine whether those lesions are benign or malignant.

This study focuses on the use of artificial neural networks (ANNs), which are
computer models inspired by the structure and function of biological neural
networks, such as the cerebral cortex of the human brain. Most ANNs are
characterized by multiple, simple computing elements or neurons that work in
parallel. The neurons interact globally through connections that have strengths or
weights, and together they can duplicate aspects of human intelligence while
incorporating the processing power of computers [34]. The classification rules are
not defined a priori; instead, the network is trained by presenting it with medical
findings and final diagnoses from many patients. The network "learns" by adapting
its weights to improve its diagnosis for each patient, just as physicians become more
experienced with time. Once trained, the network can generalize to new patients it
has not seen before.

ANNs  are  very  useful  in  handling  complex  decision  tasks  such  as  those 
involved  in  medical  diagnoses,  where  multiple  findings  are  subtly  related  in  ways 
which  are  often  difficult  to  express  in  the  form  of  diagnostic  criteria.  The  networks 
can  capture  such  relationships  between  the  input  findings  to  generate  robust  outputs. 
ANNs  solve  problems  empirically  without  requiring  any  prior  knowledge  of 
distribution  functions  or  any  type  of  statistical  modeling,  yet  ANNs  are  able  to 
duplicate  solutions  of  statistical  methods  [35].  Finally,  ANNs  are  always  consistent, 
for  they  are  not  prone  to  human  fatigue  or  bias. 

1.4.  Technical  Objectives 

The technical objectives pertaining to the first budget period are aims 1a, 1b,
and 2a from the list of aims for the entire project period shown below:

(1) Identify an optimal subset of features that would provide adequate diagnostic
performance.

1a. Retrain the features-to-diagnosis ANN using sub-groups of features. The
goal is to maintain the sensitivity of the original network while keeping
specificity reasonably high.

1b. Encode the multiple-value features into binary "sub-features", then repeat
step 1a to reduce the number of sub-features. The sub-features will be
easier to extract by automated schemes.

(2) Investigate conventional and ANN methods for extracting the optimal subset
of features directly from mammograms.

2a.  Implement  established  techniques  which  have  demonstrated  promise  for 
extracting  features  belonging  to  our  reduced  feature  set. 

2b.  Investigate  several  ANN  techniques  for  feature  extraction,  focusing  on 
features  which  may  be  difficult  to  classify  by  conventional  techniques  in 
step  2a.  For  both  2a  and  2b,  evaluate  these  techniques  by  comparing  the 
extracted  features  against  radiologists'  findings. 

(3)  Evaluate  the  automated  CAD  system  clinically. 

3a. Implement the CAD system by feeding the best feature extraction
techniques from step 2 into the best features-to-diagnosis ANN from step
1, and compare the resulting diagnosis against the biopsy result.

3b.  Evaluate  the  accuracy  of  the  CAD  system  retrospectively  by  using  patient 
records  from  our  computerized  mammography  database. 


Figure  1.  Time  line  for  proposal  project  period. 

In the following sections, we will report in detail on the progress in aim 1a.
As will be explained, aim 1b will not be pursued. The preliminary results of aim 2
will be presented at a national conference during the second budget period [36], and
discussion of those results will be reserved for the second annual report.

2. Body

In  the  proposal,  we  reported  some  preliminary  results  from  an  artificial  neural 
network  (ANN)  which  predicted  breast  lesion  malignancy  based  on  mammographic 
findings  extracted  by  radiologists  [37] .  The  mammographic  findings  in  that  study 
were  encoded  using  a  lexicon  which  was  at  that  time  standard  at  our  institution. 

Since  then  our  institute  has  adopted  a  nationally-standardized  lexicon.  We 
investigated  the  use  of  this  new  lexicon  to  take  advantage  of  its  potential  for  general 
applicability  of  the  CADx  system.  This  work  was  presented  at  a  national  conference 
[38]  and  subsequently  published  in  two  parts  in  a  peer-reviewed  journal  [39,40] .  The 
ANN  was  developed  using  206  patients  who  underwent  excisional  biopsy  and 
pathologic  diagnosis.  The  ANN  was  evaluated  by  receiver  operating  characteristic 
(ROC)  analysis  and  its  performance  was  compared  to  that  of  expert 
mammographers.  That  study  was  then  extended  by  identifying  an  optimal  subset  of 
input  features  to  simplify  the  network.  This  work  was  presented  at  two  national 
conferences  [41,42]  and  subsequently  published  in  a  peer-reviewed  journal  [43] . 

In  the  following  sections,  we  will  describe  these  studies  in  detail  in  the 
following  order:  the  data  collection  and  encoding  scheme  (2.1),  the  ANN  architecture 
(2.2),  the  backpropagation  algorithm  used  to  train  the  ANN  (2.3),  the  optimized 
reduction  in  the  number  of  features  (2.4),  and  the  remaining  technical  objectives  (2.5). 

2.1.  Data  preparation 

To  provide  the  examples  for  supervised  training  of  the  neural  network,  the 
mammograms  of  206  women  with  nonpalpable  lesions  were  randomly  selected  for 
prospective  evaluation  from  studies  completed  at  this  institution  from  1991  to  1992. 
For  all  patients,  needle  localization  and  excisional  biopsy  were  completed  and 
histologic  results  were  available.  Of  the  206  lesions  evaluated  there  were  99  masses 
alone,  76  suspicious  calcifications,  and  11  combinations  of  masses  and  associated 
microcalcifications.  The  remaining  20  lesions  included  various  combinations  of 
architectural  distortion,  regions  of  asymmetric  breast  density,  areas  of  focal 
asymmetric density, and areas of asymmetric breast tissue. Patients ranged in age
from 24 to 86 years with an average age of 55 years. At biopsy, 133 (65%) of the
lesions were found to be benign while 73 (35%) were malignant.

The  mammographic  findings  were  encoded  using  the  Breast  Imaging 
Reporting  and  Data  System  or  BI-RADS,  a  standardized  lexicon  devised  by  the 
American  College  of  Radiology  to  improve  upon  the  consistency  of  mammographic 
reports  [44,45] .  Each  set  of  films  was  reviewed  prospectively  by  one  of  two 
radiologists  whose  primary  clinical  responsibilities  are  the  interpretation  of 
mammograms  and  the  evaluation  of  breast  lesions  and  who  are  familiar  with  the 
definitions  of  the  BI-RADS  descriptors.  The  radiologist  was  provided  with  the 
cranio-caudal and mediolateral-oblique views from both breasts, as well as any other
available views and films from prior studies. The radiologist was blinded to the
biopsy result and reviewed the films prospectively.

Table 1. Coding of findings to network input values

Feature / descriptor                          Input value

Calc. Distribution
  no calcifications                           0
  diffuse                                     0.2
  regional                                    0.4
  segmental                                   0.6
  linear                                      0.8
  clustered                                   1.0

Calc. Number
  no calcifications                           0
  <5                                          0.33
  5 to 10                                     0.66
  >10                                         1.0

Calc. Description
  no calcifications                           0
  milk of calcium-like                        0.07
  rim                                         0.14
  skin                                        0.21
  vascular                                    0.29
  spherical                                   0.36
  suture                                      0.43
  coarse                                      0.50
  large rod-like                              0.57
  round                                       0.64
  dystrophic                                  0.71
  punctate                                    0.79
  indistinct                                  0.86
  pleomorphic                                 0.93
  fine branching                              1.0

Mass Margin
  no mass                                     0
  well circumscribed                          0.2
  microlobulated                              0.4
  obscured                                    0.6
  ill-defined                                 0.8
  spiculated                                  1.0

Mass Size
  mm, rel. to max                             %

Mass Shape
  no mass                                     0
  round                                       0.25
  oval                                        0.5
  lobulated                                   0.75
  irregular                                   1.0

Mass Density
  no mass                                     0
  fat-containing                              0.25
  low density                                 0.5
  isodense                                    0.75
  high density                                1.0

Location
  axillary tail                               0
  posterior                                   0.2
  middle                                      0.4
  anterior                                    0.6
  subareolar                                  0.8
  central                                     1.0

Associated Findings
  none                                        0
  skin lesion                                 0.11
  hematoma                                    0.22
  post surgical scar                          0.33
  trabecular thickening                       0.44
  nipple retraction                           0.56
  skin retraction                             0.67
  skin thickening                             0.78
  architectural distortion                    0.89
  axillary adenopathy                         1.0

Special Cases
  none                                        0
  intramam. lymph node                        0.25
  asym. breast tissue                         0.5
  focal asym. density                         0.75
  tubular density or solitary dilated duct    1.0

Age
  yrs, rel. to max                            %


The  radiologist  was  asked  to  describe  ten  radiographic  findings  pertaining  to 
lesion  morphology.  The  first  three  are  descriptive  features  that  apply  to 
microcalcifications  and  calcifications  associated  with  masses:  calcification 
distribution,  number  and  description.  Another  four  features  apply  only  to  masses: 
mass  margin,  mass  shape,  mass  density,  and  mass  size.  Three  features  that  can  apply 
to  all  lesions  include  lesion  location,  associated  findings  (e.g.  axillary  adenopathy), 
and  special  cases  (e.g.  asymmetric  breast  tissue).  The  patient  age  was  also  recorded. 
The  radiologist  also  assigned  an  overall  impression  of  malignancy  on  a  scale  from 
one  to  five:  one  =  benign,  two  =  probably  benign,  three  =  indeterminate,  four 
=  probably  malignant,  and  five  =  malignant.  This  estimate  of  probability  for 
malignancy  was  used  only  to  evaluate  the  radiologists'  performance. 

For quantitative features, the neural network input was simply the numeric
value, such as mass size in millimeters normalized by the maximum mass size. In
comparison, qualitative features were recorded in a multiple-choice format.
Radiologists selected one of several possible descriptors for each feature, such as the
six choices for mass margin: no mass, well circumscribed, microlobulated, obscured,
indistinct (listed as ill-defined in Table 1), or spiculated. These feature descriptors
were then coded into equally spaced numeric values from zero to one, as shown in
Table 1 above. The ordering of the descriptors for each feature was arrived at by
discussion with experienced mammographers and review of reports discussing the
malignant potential of various BI-RADS descriptors [45,46].
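
The coding scheme described above can be illustrated with a minimal Python sketch. It assumes that "equally spaced" means descriptor number i out of n (counting from zero) maps to i/(n-1), rounded to two decimals. The function names `equally_spaced_codes` and `normalize_quantitative` are hypothetical and do not come from the original C software.

```python
# Hypothetical helper (not from the report's software): map an ordered list of
# descriptors to equally spaced values on [0, 1], in the spirit of Table 1.
def equally_spaced_codes(descriptors):
    n = len(descriptors)
    if n < 2:
        raise ValueError("need at least two descriptors")
    # Descriptor i (0-based) gets the value i / (n - 1), rounded to 2 decimals.
    return {name: round(i / (n - 1), 2) for i, name in enumerate(descriptors)}


def normalize_quantitative(value, max_value):
    """Scale a quantitative feature (e.g. mass size in mm) by its maximum."""
    return value / max_value


# Mass margin descriptors in the order listed in Table 1.
MASS_MARGIN = ["no mass", "well circumscribed", "microlobulated",
               "obscured", "ill-defined", "spiculated"]
margin_codes = equally_spaced_codes(MASS_MARGIN)
```

Under this assumption the six mass margin descriptors receive 0, 0.2, 0.4, 0.6, 0.8, and 1.0, which agrees with the values listed in Table 1.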

2.2.  Backpropagation  neural  network  architecture 

The ANNs we used were three-layer feed-forward
backpropagation networks. A typical network is illustrated in Figure 2 on the next
page.  For  this  network,  the  input  layer  consists  of  eight  input  nodes  (three  features 
are  excluded  to  simplify  the  figure).  The  inputs  are  fully  connected  to  the  hidden 
layer  which  in  this  case  consists  of  16  nodes.  These  hidden  nodes  are  in  turn 
connected  to  the  output  layer,  consisting  of  a  single  node  representing  diagnostic 
outcome.  The  network's  output  is  compared  to  the  desired  or  target  output,  which  is 
set  to  the  actual  pathologic  diagnosis:  0  for  benign  and  1  for  malignant. 

For this features-to-diagnosis network, we use the subscript i for the input
layer, j for the hidden layer, and k for the output layer. The weight connecting input
feature $F_i$ and hidden node $H_j$ is $W_{ij}$, and the weight connecting hidden node $H_j$ and
the output node or diagnosis $D_k$ is $V_{jk}$. Since the network has only a single output,
the k subscripts are unnecessary and thus omitted.


Figure  2.  Preliminary  neural  network  architecture. 

Each node in the hidden and output layers calculates the weighted sum of its
inputs from the previous layer, adds a bias value, then passes the resulting sum through
a sigmoid thresholding function $f$ (Eq. 1) to yield an output value between 0
and 1.

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

This process is illustrated in Fig. 2 above by the "fan-in" of the darker lines (i.e.,
weights $W_{i6}$) to the 6th hidden node. The bias is added and the sum passed through
the sigmoid function to produce the output value for that hidden node, $H_j$, as shown
in Eq. 2 below.

$$H_j = f\left(\sum_{i=1}^{n=8} W_{ij} F_i + \mathrm{bias}_j\right) \qquad (2)$$

$$D = f\left(\sum_{j=1}^{m=16} V_j H_j + \mathrm{bias}_{out}\right) \qquad (3)$$

The outputs from the sixteen hidden nodes then become inputs to the last layer,
consisting of the single output node. As before, the weighted sum of these inputs is
formed, the single bias value $\mathrm{bias}_{out}$ is added, and the result is passed through the
sigmoid function, as shown in Eq. 3 above. The resulting single output is the diagnosis $D$.
The diagnosis lies between 0 and 1, and may be compared with a threshold (between
0 and 1) during evaluation. If the output exceeds the threshold, a malignancy is
predicted; otherwise the outcome is predicted to be benign.
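
The forward pass described in this section can be sketched in a few lines of Python. This is an illustration only, not the original C implementation: it assumes a logistic sigmoid, uses the 8-input, 16-hidden-node layout of Figure 2, and picks a threshold of 0.5 purely as an example. The names `forward` and `predict` and the input values are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid, assumed here as the thresholding function f."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, W, hidden_bias, V, out_bias):
    """W[i][j] links input i to hidden node j; V[j] links hidden node j to the output."""
    hidden = [
        sigmoid(sum(W[i][j] * features[i] for i in range(len(features))) + hidden_bias[j])
        for j in range(len(V))
    ]
    D = sigmoid(sum(V[j] * hidden[j] for j in range(len(V))) + out_bias)
    return hidden, D


def predict(D, threshold=0.5):
    """The threshold value is an illustrative assumption; the text only says 0 to 1."""
    return "malignant" if D > threshold else "benign"


# Example with small random weights in [-0.3, 0.3] and made-up input values.
rng = random.Random(0)
n_in, n_hid = 8, 16
W = [[rng.uniform(-0.3, 0.3) for _ in range(n_hid)] for _ in range(n_in)]
hidden_bias = [rng.uniform(-0.3, 0.3) for _ in range(n_hid)]
V = [rng.uniform(-0.3, 0.3) for _ in range(n_hid)]
out_bias = rng.uniform(-0.3, 0.3)
features = [0.0, 0.33, 0.5, 1.0, 0.4, 0.25, 0.75, 0.6]
hidden, D = forward(features, W, hidden_bias, V, out_bias)
```

Sweeping the threshold passed to `predict` over the range 0 to 1 is what generates the different sensitivity and specificity operating points used later in the ROC analysis.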

2.3.  Backpropagation  training  algorithm 


The "knowledge" of each ANN was contained in its weights, which were
initialized to small random numbers (uniformly distributed between -0.3 and 0.3).
The network "learned" by using the generalized delta rule, whereby it adapted those
weights over many presentations of training cases or iterations, using a gradient
descent technique to minimize the mean squared error between the network and
target outputs.


Using the round robin or "leave one out" technique, the network was trained
on all but one of the examples for a fixed number of iterations, then tested on the one
excluded example [33]. The excluded example was replaced, the network weights
were reinitialized, and the training was repeated by excluding a different example
until every example had been excluded once.
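
The round robin procedure can be sketched generically as follows. The `train` and `predict` callables stand in for the ANN training and testing routines; the simple mean-threshold classifier used here is a toy substitute chosen only to keep the example short and runnable, and all names and data are hypothetical.

```python
def round_robin(cases, targets, train, predict):
    """Train on all cases but one, test on the held-out case, repeat for every case."""
    outputs = []
    for held_out in range(len(cases)):
        train_x = cases[:held_out] + cases[held_out + 1:]
        train_y = targets[:held_out] + targets[held_out + 1:]
        model = train(train_x, train_y)   # fresh model each time (weights reinitialized)
        outputs.append(predict(model, cases[held_out]))
    return outputs


# Toy stand-in for the ANN (an assumption for illustration only): the "model" is
# the midpoint between the mean feature value of each class.
def train_mean_threshold(xs, ys):
    pos = [x[0] for x, y in zip(xs, ys) if y == 1]
    neg = [x[0] for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_mean_threshold(model, x):
    return 1.0 if x[0] > model else 0.0


# Made-up one-feature cases: three benign (0) and three malignant (1).
cases = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
targets = [0, 0, 0, 1, 1, 1]
outputs = round_robin(cases, targets, train_mean_threshold, predict_mean_threshold)
```

Each case contributes exactly one test output, so the list returned by `round_robin` has one entry per case and can be passed directly to an ROC analysis.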

For each training case consisting of the input/output pair p, the network
output $D$ was compared against the target diagnosis $D_p$ (set to one if the biopsy
revealed malignancy and zero if benign) to form the error $\mathrm{Derr}_p$:

$$\mathrm{Derr}_p = D(1 - D)(D_p - D) \qquad (4)$$


Using this error term, we calculated additive correction or "delta" factors for the
output node's weights and bias at iteration n:

$$\Delta V_j^{n} = \eta \, H_j \, \mathrm{Derr}_p + \alpha \, \Delta V_j^{n-1} \qquad \Delta \mathrm{bias}_{out}^{n} = \eta \, \mathrm{Derr}_p \qquad (5)$$

The delta factor for a weight depended on both the error and the input which caused
that error. Both $\eta$ and $\alpha$ were proportionality constants. The learning rate $\eta$
controlled the rate of convergence, while the momentum $\alpha$ enhanced the speed and
stability of convergence by incorporating part of the old delta factor into the current
one. The delta factors were added to the old weights and bias to yield the current
weights and bias:

$$V_j^{n} = V_j^{n-1} + \Delta V_j^{n} \qquad \mathrm{bias}_{out}^{n} = \mathrm{bias}_{out}^{n-1} + \Delta \mathrm{bias}_{out}^{n} \qquad (6)$$

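Since there were no target values to compare against the hidden nodes'
values, the output error $\mathrm{Derr}_p$ was backpropagated to determine each hidden node's
error $\mathrm{Herr}_{j,p}$:

$$\mathrm{Herr}_{j,p} = H_j (1 - H_j) \, V_j \, \mathrm{Derr}_p \qquad (7)$$

As before, delta factors were calculated for the hidden nodes' weights and biases:

$$\Delta W_{ij}^{n} = \eta \, F_i \, \mathrm{Herr}_{j,p} + \alpha \, \Delta W_{ij}^{n-1} \qquad \Delta \mathrm{bias}_j^{n} = \eta \, \mathrm{Herr}_{j,p} \qquad (8)$$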
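
The update rules of Eqs. 4 through 8 can be collected into a single training step, sketched below in Python. Several details are assumptions rather than facts from the report: the learning rate (0.1) and momentum (0.9) values are illustrative because the report does not state them, the weights are updated after every case, and the hidden errors are computed from the output weights as they stood before the update. The names `train_step` and `make_net` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(net, prev, F, target, eta=0.1, alpha=0.9):
    """One generalized-delta-rule update for a single case (Eqs. 4-8).

    eta and alpha are illustrative values, not taken from the report.
    """
    n_in, n_hid = len(net["W"]), len(net["V"])
    H = [sigmoid(sum(net["W"][i][j] * F[i] for i in range(n_in)) + net["bh"][j])
         for j in range(n_hid)]
    D = sigmoid(sum(net["V"][j] * H[j] for j in range(n_hid)) + net["bo"])
    derr = D * (1.0 - D) * (target - D)                          # Eq. 4
    # Hidden errors use the output weights from before this update (Eq. 7).
    herr = [H[j] * (1.0 - H[j]) * net["V"][j] * derr for j in range(n_hid)]
    for j in range(n_hid):
        dV = eta * H[j] * derr + alpha * prev["V"][j]            # Eq. 5
        net["V"][j] += dV                                        # Eq. 6
        prev["V"][j] = dV
        for i in range(n_in):
            dW = eta * F[i] * herr[j] + alpha * prev["W"][i][j]  # Eq. 8
            net["W"][i][j] += dW
            prev["W"][i][j] = dW
        net["bh"][j] += eta * herr[j]                            # Eq. 8 (bias)
    net["bo"] += eta * derr                                      # Eqs. 5-6 (bias)
    return D  # network output before this update


def make_net(n_in, n_hid, rng):
    """Weights and biases start uniformly in [-0.3, 0.3]; previous deltas start at 0."""
    u = lambda: rng.uniform(-0.3, 0.3)
    net = {"W": [[u() for _ in range(n_hid)] for _ in range(n_in)],
           "bh": [u() for _ in range(n_hid)],
           "V": [u() for _ in range(n_hid)],
           "bo": u()}
    prev = {"W": [[0.0] * n_hid for _ in range(n_in)], "V": [0.0] * n_hid}
    return net, prev


# Repeatedly present one made-up malignant case (target 1) to a small network.
rng = random.Random(1)
net, prev = make_net(3, 4, rng)
case, target = [0.2, 0.8, 1.0], 1.0
first = train_step(net, prev, case, target)
for _ in range(200):
    last = train_step(net, prev, case, target)
```

Presenting the same case repeatedly moves the output toward its target, which is the behavior the generalized delta rule is meant to produce. In practice one iteration would loop `train_step` over every training case.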

After each iteration, consisting of one complete presentation of all training cases, the
training and testing MSE were calculated. The testing MSE over the L = 206 cases
was:

$$\mathrm{MSE}^{n} = \frac{1}{L} \sum_{p} (D_p - D)^2 \qquad (9)$$

Also after each iteration, for the L testing outputs, the sensitivity and specificity over
a range of decision thresholds were expressed as a receiver operating characteristic
(ROC) curve. Performance of both the networks and radiologists was measured and
compared by the ROC area index, Az, using LABROC4 and CLABROC software
(provided by Dr. Charles Metz, University of Chicago, Chicago, IL) [47,48]. Large
area indices close to 1 corresponded to high specificity and sensitivity.
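
As a rough illustration of what the ROC area measures, the sketch below computes sensitivity and specificity at one threshold and an empirical (nonparametric) ROC area. This is not the method used in the study: LABROC4 fits a binormal model to obtain Az, whereas the pairwise-comparison estimate here is a simpler substitute that is equivalent to the trapezoidal area under the empirical curve. The function names and sample values are hypothetical.

```python
def sens_spec(outputs, targets, threshold):
    """Sensitivity and specificity when outputs above the threshold are called malignant."""
    tp = sum(1 for o, t in zip(outputs, targets) if t == 1 and o > threshold)
    fn = sum(1 for o, t in zip(outputs, targets) if t == 1 and o <= threshold)
    tn = sum(1 for o, t in zip(outputs, targets) if t == 0 and o <= threshold)
    fp = sum(1 for o, t in zip(outputs, targets) if t == 0 and o > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_roc_area(outputs, targets):
    """Fraction of (malignant, benign) pairs ranked correctly; ties count as one half."""
    pos = [o for o, t in zip(outputs, targets) if t == 1]
    neg = [o for o, t in zip(outputs, targets) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up network outputs and biopsy results (1 = malignant, 0 = benign).
outputs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
targets = [0, 0, 1, 1, 1, 0]
area = empirical_roc_area(outputs, targets)
sensitivity, specificity = sens_spec(outputs, targets, 0.5)
```

For these six made-up cases, eight of the nine malignant-benign pairs are ranked correctly, so the empirical area is 8/9.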

The optimal number of iterations was found by halting training when the
testing MSE no longer decreased, indicating that the network had become
overtrained and was losing its ability to generalize to new cases. Since the final measure
of merit was Az, the effect of halting training when the Az no longer increased was
also investigated. Minimizing the testing error almost always yielded the same
stopping point as maximizing ROC area. The neural networks required 200-1000
iterations to minimize error. Maximizing Az instead sometimes increased or
decreased training by a few hundred iterations, but never improved Az by more than
30% of a standard deviation.

Similarly, the number of hidden nodes was varied from 5 to 25 to optimize
the Az. More hidden nodes would permit the network to form more complex
decision regions and become a better classifier. Too many hidden nodes, however,
would result in more weights than can be reliably estimated from the limited
number of examples [49]. The network was robust to variations in the number
(between 5 and 25) of hidden nodes. Although it performed best with fifteen and
worst with five hidden nodes, the difference in Az was only 1%. This trend was
typical of all networks in this study. Therefore, unless otherwise noted, all of the
following results were reported for networks with fifteen hidden nodes.



All  neural  network  software  was  custom-written  by  the  principal  investigator 
in  the  C  language  and  executed  on  SPARC  20  workstations  (Sun  Microsystems,  Inc., 
Mountain  View,  CA).  For  each  run  consisting  of  a  new  combination  of  inputs  and 
hidden  nodes,  the  round  robin  process  of  training  and  testing  required 
approximately  4  hours  without  parallelization. 

2.4.  Optimized  reduction  of  input  features 

Given  all  ten  mammographic  features  and  patient  age,  the  best  11-feature 
network  performed  with  Az  of  0.84  ±  0.03,  which  was  not  significantly  different  from 
the  expert  radiologists'  Az  of  0.85  ±  0.03  (2-tailed  p-value  =  0.54).  This  was  an 
important  result  since  the  network  did  not  have  access  to  all  the  information  that 
radiologists  did,  such  as  mammograms  from  other  views  and  previous  studies, 
clinical  findings,  and  history  findings  other  than  age.  In  a  separate  paper,  we 
demonstrated  further  improvements  by  including  history  findings  in  the  ANN 
inputs,  raising  Az  to  0.89  ±  0.02  which  was  still  not  significantly  better  than  the 
radiologists  (p=0.29)  [39] . 

We  next  sought  to  identify  the  minimal  subset  of  features  which  would  still 
yield  accurate  diagnostic  performance.  There  were  several  motivating  reasons  for 
doing  so.  Fewer  features  would  reduce  the  data-entry  effort  of  radiologists,  which  in 
turn  makes  it  more  likely  that  they  would  incorporate  the  ANN  into  their  standard 
reading  process.  Previous  studies  have  shown  that  community  radiologists  and 
technologists  may  extract  features  as  reliably  as  expert  mammographers,  but  lack  the 
latter's experience in merging those features into a diagnosis [50,51]. Reducing the
number of inputs may enable the use of community radiologists or technologists for
feature extraction, thus improving accessibility of expert diagnosis. Fewer inputs
should  also  permit  reducing  the  number  of  hidden  nodes  and  hence  the  number  of 
ANN  weights,  thus  ameliorating  the  problem  of  overconditioning  due  to  insufficient 
training  cases  [49] .  Finally,  a  simplified  computer  model  may  shed  some  light  on  the 
complex  cognitive  processes  underlying  radiological  diagnosis. 

In  general,  input  features  may  be  eliminated  one  or  a  few  at  a  time,  then  the 
algorithm  retrained  and  retested  to  determine  the  significance  of  the  excluded 
feature(s).  Since  many  features  are  correlated,  however,  this  process  becomes  more 
complicated.  In  other  words,  groups  of  features  may  be  greater  or  less  than  the  sum 
of  their  parts.  Previous  authors  reduced  the  number  of  inputs  by  using  only  those 
inputs  whose  mean  value  differed  greatly  for  benign  vs.  malignant  cases  [33].  Others 
employed  statistical  techniques  such  as  linear  discriminant  analysis  to  identify  an 
optimal  subset  of  inputs  [52].  To  fulfill  the  objectives  of  this  study,  we  employed  a 
new  technique  using  nonlinear  ANNs  to  identify  the  optimized  subset  of  features. 

Since each subset of features required developing a new ANN (varying the
number of hidden nodes and then performing a round robin for each), it was
impractical to investigate all possible combinations. Instead, the features were
ordered by their importance, then eliminated one by one until network performance
was significantly degraded.

To rank the features by importance, one feature was excluded and a neural
network  was  retrained  using  all  the  other  features.  As  before,  network  performance 
was  measured  by  Az.  The  process  was  then  repeated  with  a  different  feature  until  all 
features  had  been  excluded  once.  The  assumption  was  that  the  exclusion  of  an 
important  feature  would  degrade  performance  more  than  the  exclusion  of  an 
unimportant  feature.  In  order  from  most  to  least  important,  the  features  were: 

(1) age, (2) mass margin, (3) calcification description, (4) mass density, (5) associated
findings, (6) calcification distribution, (7) lesion location, (8) mass size,
(9) calcification number, (10) mass shape, and (11) special cases. Excluding the most
important feature (age) reduced Az from 0.84 to 0.80, while excluding the least
important feature (special cases) did not affect Az at all.
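
As an illustration only, the exclusion-based ranking described above might be sketched as follows. The function name `rank_by_exclusion` and the `evaluate` callback are assumptions introduced here; in the study, evaluating a subset meant retraining an ANN and measuring Az by round robin, which is replaced below by a made-up additive scoring function so that the sketch runs on its own.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_exclusion(
    features: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> List[Tuple[str, float]]:
    """Rank features by the performance lost when each is excluded in turn."""
    baseline = evaluate(tuple(features))
    drops = []
    for excluded in features:
        subset = tuple(f for f in features if f != excluded)
        drops.append((excluded, baseline - evaluate(subset)))
    # Largest drop first; sorted() is stable, so ties keep their input order.
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for "retrain the network and measure Az".
# The weights are invented for illustration and are not results from the study.
toy_weights = {"age": 0.04, "mass margin": 0.03, "special cases": 0.0}


def toy_evaluate(subset: Tuple[str, ...]) -> float:
    return 0.77 + sum(toy_weights[f] for f in subset)


ranking = rank_by_exclusion(list(toy_weights), toy_evaluate)
for name, drop in ranking:
    print(f"{name}: drop in score = {drop:.2f}")
```

Because each feature is excluded on its own, a ranking of this kind cannot capture interactions among correlated features, which is the limitation noted earlier in this section.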


Figure 3. Reducing number of features (horizontal axis: # input features). As ANN
features are reduced from 11 to 2, Az is comparable to radiologists (dashed line).

Table 2. Performance of ANNs as the number of features is reduced

# features      Az      SD      p
11              0.84    0.03    0.54
6               0.86    0.03    0.43
2               0.85    0.03    0.83
radiologists    0.85    0.03    -

For each network, the Az and standard deviations are shown, along with the p-value for
the difference compared to radiologists. The 6-feature network is the best, but all ANNs
showed no difference vs. radiologists.


Once ranked, the input features were discarded in order from least to most
important in a manner analogous to backward discriminant analysis, reducing the
number of features to ten, nine, eight, and so on. Each simplified network was
retrained and re-tested with the round robin process as before, and its performance
was compared to that of the expert radiologists. The performance of these simplified
networks is plotted in Figure 3 and summarized in Table 2.
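
The elimination loop just described can be sketched roughly as below. This is a minimal sketch under stated assumptions: `backward_elimination` and `evaluate` are hypothetical names, and `toy_evaluate` is an invented placeholder for retraining and round-robin testing of an ANN. Only the ranked feature list is taken from the text.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    ranked_most_to_least: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
    min_features: int = 1,
) -> List[Tuple[int, float]]:
    """Drop the least important feature repeatedly, scoring each subset.

    Returns (number of features kept, score) pairs, starting with all features.
    """
    kept = list(ranked_most_to_least)
    results = [(len(kept), evaluate(tuple(kept)))]
    while len(kept) > min_features:
        kept.pop()  # discard the current least important feature
        results.append((len(kept), evaluate(tuple(kept))))
    return results


# Feature order as reported in the text, most to least important.
ranked = [
    "age", "mass margin", "calcification description", "mass density",
    "associated findings", "calcification distribution", "lesion location",
    "mass size", "calcification number", "mass shape", "special cases",
]


# Hypothetical placeholder: a real evaluation would retrain the network.
def toy_evaluate(subset: Tuple[str, ...]) -> float:
    return 0.85 if "age" in subset else 0.5


curve = backward_elimination(ranked, toy_evaluate)
for n_features, score in curve:
    print(n_features, score)
```

In practice the loop would stop once performance is significantly degraded relative to the full network; the fixed `min_features` bound here is a simplification.
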

As the number of features was reduced from eleven to only two, the Az barely
fluctuated, with changes of much less than one standard deviation. Even in the extreme
case, the two-feature network still performed well with Az of 0.85 ± 0.03, which was not
statistically significantly different from that of the radiologists (p = 0.83). The
six-feature network emerged as the best compromise between minimizing features and
maximizing performance. Its ROC area of 0.86 ± 0.03 was not significantly different
from that of the expert radiologists with p = 0.34. The ROC curve of the six-feature
network is plotted in Figure 4 against that of the radiologists.

Fig. 4. ROC curves of 6-feature ANN vs. radiologists (horizontal axis: false positive
fraction).
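
For readers unfamiliar with ROC area, the following sketch computes a nonparametric (Mann-Whitney) estimate from raw outputs. This is an assumption made for illustration: the estimator used for the Az values in this report is not described in this section and may differ (for example, a fitted binormal curve), so the sketch should not be expected to reproduce the reported numbers. The scores and labels are invented.

```python
from typing import Sequence


def roc_area(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of ROC area.

    labels: 1 for malignant, 0 for benign. The result is the fraction of
    (malignant, benign) pairs in which the malignant case scores higher,
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: three malignant and three benign cases.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_area(toy_scores, toy_labels))  # 8 of 9 pairs are ordered correctly
```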


Histograms  of  the  neural  network  outputs  and  radiologist  impressions  for  all 
cases are plotted as Figures 5 and 6, respectively. Note the Gaussian shape of the
radiologist  impressions  where  most  cases  were  indeterminate,  compared  to  the 
bimodal  outputs  of  the  computer  models  where  most  cases  could  be  definitively 
diagnosed  as  positive  or  negative. 


Fig. 5. Histogram of ANN outputs (horizontal axis: ANN output).

Fig. 6. Histogram of radiologists' impressions (horizontal axis: radiologists'
impression).

In addition to high ROC area, it is also important to have good specificity at
the very high sensitivities required in clinical practice. We considered the particular
operating point on each ROC curve where sensitivity was 95%, corresponding to
missing 5% of the malignancies identified by radiologists. At 95%
sensitivity, the specificity of the neural network (56%) was statistically significantly
greater than the specificity of the radiologists alone (30%) with two-tail p-value <
0.01. In terms of actual patients in this study, at 95% sensitivity the ANN would
have missed 3 out of 73 malignancies but prevented 74 of 133 benign biopsies. At the
portion of the ROC curves where most mammographers are trained to practice (i.e.,
high sensitivity and low specificity), the ANN maintained a very high relative
sensitivity while significantly improving the mammographer's specificity.
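
The operating-point analysis can be illustrated with a short sketch. The function `specificity_at_sensitivity` and the toy scores are assumptions introduced here, not the procedure actually used in the study; the only figures taken from the text are the patient counts (3 of 73 malignancies missed, 74 of 133 benign biopsies avoided), which are checked against the quoted percentages at the end.

```python
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float = 0.95
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity is at least `target`.

    A case is called positive when its score >= threshold. Raising the
    threshold can only raise specificity, so the highest qualifying
    threshold gives the best specificity at the required sensitivity.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target sensitivity must be in (0, 1]")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target:
            specificity = sum(s < threshold for s in neg) / len(neg)
            return threshold, sensitivity, specificity
    raise RuntimeError("no threshold reached the target sensitivity")


# Invented scores: four malignant cases followed by four benign cases.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, target=0.95))

# Patient counts quoted in the text, converted to fractions.
report_sensitivity = (73 - 3) / 73  # malignancies not missed
report_specificity = 74 / 133       # benign biopsies avoided
print(f"{report_sensitivity:.3f} {report_specificity:.3f}")
```

With whole patients the sensitivity cannot land exactly on 95%; 70 of 73 is about 95.9%, and 74 of 133 is about 55.6%, consistent with the 56% specificity quoted above.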

This  work  demonstrated  an  empirical  technique  for  identifying  an  optimal 
subset  of  input  features  to  a  complex,  nonlinear  classification  system.  There  was 
minimal change in network performance as the number of inputs was pared from
eleven  down  to  six,  which  represented  an  optimal  compromise  between  minimizing 
the  number  of  input  features  and  maximizing  performance.  The  ANN's  optimized 
subset  of  features  correlated  well  with  those  identified  by  expert  radiologists  as  being 
among  the  most  important,  although  no  clinician  would  be  willing  to  make  a 
diagnosis  based  on  so  few  findings.  It  was  therefore  all  the  more  remarkable  that  the 
neural  network  outperformed  the  specificity  of  the  expert  mammographers  who 
extracted  the  input  features  in  the  first  place  and  who  also  had  access  to  other 
information  such  as  previous  films  and  the  patients'  clinical  history. 

2.5  Other  technical  objectives 

The preceding sections pertained to specific aim 1a listed in the proposal's
technical objectives. In specific aim 1b, we originally proposed to separate the
features identified in aim 1a into binary sub-features, thus facilitating their
automated  extraction.  Preliminary  work  during  this  first  year  suggested  several 
reasons  not  to  focus  so  specifically  on  individual  sub-features.  First,  analysis  of  the 
distribution  of  the  various  sub-features  revealed  that  some  sub-features  were  present 
in  very  few  patients,  which  would  make  teach-by-example  development  of  ANNs 
difficult.  Second,  the  original  proposal  assumed  that  each  feature  was  separable  into 
discrete,  non-overlapping  sub-features.  Our  studies  demonstrated  considerable 
inter-observer variation in categorizing each feature into sub-features [40]. Finally,
for some features, the sub-features may be considered as approximate gradations of
the same phenomenon. For example, the mass margin ranges from circumscribed,
which is very "smooth," to spiculated, which is very "rough." For these reasons, we
deemed it unnecessary to separate these inter-related, overlapping sub-features for
ANN  development. 

Work  on  specific  aim  2a  was  commenced  in  accordance  with  the  time  line 
shown  in  the  original  proposal.  The  initial  results  will  be  presented  at  a  national 
conference  during  the  second  budget  period  [36],  so  discussion  of  that  work  will  be 
reserved  for  the  second  annual  report. 

3. Conclusions


The goal of this proposal is to develop a computer-aided diagnosis system to
automatically  extract  radiographic  features  from  the  mammogram,  then  use  an 
artificial  neural  network  (ANN)  to  merge  those  features  to  predict  breast  lesion 
malignancy.  During  the  first  budget  period,  we  successfully  developed  an  ANN  that 
merges  radiologist-extracted  features  to  predict  malignancy.  By  adopting  the 
nationally-standardized  BI-RADS  lexicon  for  encoding  features,  this  ANN  has  the 
potential  to  be  widely  applicable.  Finally,  we  also  developed  an  empirical  technique 
for  identifying  an  optimal  subset  of  input  features  to  a  complex,  nonlinear 
classification  system.  This  technique  can  be  applied  to  any  problem  where  one 
wishes  to  optimally  simplify  a  large  number  of  input  features  for  ANN  development. 

At  the  conclusion  of  the  first  budget  period,  we  have  accomplished  the  stated 
goals  of  the  proposal  for  that  time  period,  or  indicated  our  reasons  for  not  doing  so. 
In  the  next  year,  we  will  continue  in  accordance  with  the  proposal  time  line. 
Specifically,  we  will  conclude  work  on  aim  2a  concerning  conventional  techniques 
for  automated  feature  extraction.  We  will  at  the  same  time  commence  work  on  aim 
2b,  using  ANNs  for  feature  extraction.  If  successful,  the  features  extracted  by 
conventional  or  ANN  methods  will  eventually  be  fed  to  the  features-to-diagnosis 
network  developed  during  the  first  year.  Together,  the  system  will  provide 
automated,  accurate  predictions  of  breast  lesion  malignancy. 